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<~| ■ Abstract. We study by computer simulation the "Hawkes process" that was 

proposed in a recent paper by Crane and Sornette (Proc. Nat. Acad. Sci. USA 
O ' 105, 15649 (2008)) as a plausible model for the dynamics of YouTube video viewing 

O ■ numbers. We test the claims made there that robust identification is possible for classes 

of dynamic response following activity bursts. Our simulated timeseries for the Hawkes 
Q ' process indeed fall into the different categories predicted by Crane and Sornette. 

' 55 ■ However the Hawkes process gives a much narrower spread of decay exponents than 

the YouTube data, suggesting limits to the universality of the Hawkes-based analysis. 

£?' 

^ I PACS numbers: 89.75.Fb, 89.20.Hh, 05.40.-a 

> 

\0 . 1. Introduction 

00 

Recently pQ , Crane and Sornette analysed the viewing of YouTube videos as an example 
of a nonlinear social system. They identified peaks in the timeseries of viewing figures 



for around half a million videos and studied the subsequent decay of the peak to a 
background viewing level. A self-excited Poisson process, or Hawkes process [2], was 
proposed as a model of the video-watching dynamics, and a plausible link made to 
the social interactions that create strong correlations between the viewing actions of 
different people. Individual viewing is not random but influenced by various channels 
of communication about what to watch next. 

The Hawkes process has a Poisson distributed number of views, with an 
instantaneous rate given by 

\(t) = ri(t)+ £ ia<Kt-ti). (l) 
{ti<t} 

Here rj is a noisy source term (allowing viewing to occur even for a completely dormant 
video, for instance) and the summation describes how past viewing events at times 
influence the current event rate. The coefficient \ii is the number of potential viewers 
influenced directly by person i who viewed a video at time U; the function <fi describes 
the waiting time distribution for those influenced, between trigger and response. (Put 
differently, this is the distribution of waiting times between finding out about a particular 
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video and actually viewing the video.). If 4>(t) is a power- law memory function as used 
in this work, the resulting process is also known as an ETAS model [3] , and can be used 
to model earthquake aftershocks. 

On the basis of previous work [3], Crane and Sornette chose a long-memory waiting 
time distribution 

0(t)~r( 1+9) 9e{0,i). (2) 

For fixed 9 (we address variability in 9 later on), the behaviour of a timeseries generated 
by such a Hawkes process then depends on the distribution of fi. There are four separate 
dynamical classes, two if a viewing shock happens from an external stimulus, two from 
internal dynamics [5] . In this paper we address only the dynamics of externally shocked 
timeseries, but for completeness all four dynamic classes will be outlined below. 

In each dynamic class, there is a different prediction for the power-law decay of the 
activity level following an initial shock. The power laws involved are quite distinct for 
each class, and predicted by [5j to depend solely on 9, whereas the statistics of \i control 
solely which class one is in. Therefore, if Eq.(J2]) were really to hold (with a unique 9) one 
might naively expect the distribution of power law exponents observed in the data to 
collapse onto a set of discrete delta-functions, one for each dynamic class. On reflection, 
however, this cannot be correct since an individual activity burst represents a sequence 
of discrete events which (unless the total number of these is enormously large) is unlikely 
to be fully self-averaging for the purposes of fitting a power law to the long-time decay. 
In practice for the YouTube data [1] the distribution of fitted exponents is very broad 
with, at best, bump-like features at the expected discrete exponent values. Crane and 
Sornette get around this by using a quite separate method (detailed below) to classify 
the dynamic class of each burst, and then showing that the subdistributions for each 
class are unimodal with modal values close to the expected one for that class. The 
overlapping exponent distributions that necessitate this procedure do however call into 
question the announced robustness [1] of the dynamic classes inferred from the Hawkes 
model. 

In the present work we perform simulations that shed light on how much of this 
exponent variability can be expected to arise from the Hawkes process itself. Any 
variability beyond this level in the YouTube data is evidence that Eq.(j2]) does not really 
describe the social dynamics of YouTube. Of course, nobody would expect this dynamics 
to be captured exactly by the Hawkes process; however, behind the concept of robust 
dynamic classes in [1] lies a broader notion of universality. For instance in equilibrium 
critical phenomena, a very simple model (the Ising model) captures to arbitrary accuracy 
the universal features of a wide class of phase transitions involving order parameters of 
the same symmetry. In the wider context of nonequilibrium criticality, the universal 
status of simple models is much less well established, and deserves detailed attention. 
Our simulation results suggest that this universality may be somewhat limited, at least 
if one is interested in the distribution of fitted exponents for individual activity bursts 
within each dynamic class. 
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In what follows we first classify all four dynamic regimes before presenting the 
analysis of our results. 

2. External shock 

In this regime, the viewing rate is first dominated by the rj term in Eq.(j2]). At some 
time, t , the video gains widespread public attention. (It might be featured in a national 
newspaper, or on some high-traffic website; or it may relate to a famous person whose 
death is suddenly announced.) This produces a spike in the viewing figures which then 
decay away. The form of the decay depends on the distribution of \x. 

• If (/i) = 1, a cascade of viewing events occurs and the timeseries decays from the 
shock like ~ t~^~ e \ This is termed a critical decay. 

• If (/i) < 1, only first generation viewing events are important (i.e., those stimulated 
by the external source) and the timeseries decays like ~ £^ 1+61 ). This is a subcritical 
decay. 

3. Internal shock 

A particular series of viewing events can lead to an internally created maximum in the 
timeseries (above that expected for a Poisson noise process). This internal shock has 
a different decay exponent again from the externally induced peaks. The two internal 
dynamic classes are: 

• A simple noise process if (n) < 1; no coherent peak arises. 

• A peak grows and decays like ~ t"^ 1-26 '); this occurs if (//) = 1. 

4. Classification and exponent values 

If this model is correct for the dynamics of video views, it should be possible to identify 
the different dynamic classes by finding peaks in the viewing timeseries and then fitting 
a power law to the subsequent decay. These power laws should form a distribution 
which arises as the merger of the various classes; if the individual activity bursts can 
separately be classified, the subdistribution for each class can be extracted. Crane and 
Sornette perform such an analysis and by fitting to the modal exponent for each class 
infer a value for the exponent 9 in Eq.(j2]) of 9 ~ 0.4. We therefore create artificial 
timeseries with (9) = 0.4 for best comparison with their data. With this choice, we 
expect to extract decay exponents (recalling that we only study the externally shocked 
case) of 

• /3 SC = 1 + 9 = 1.4 and 

• /3 C = 1 - e = 0.6 
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As mentioned previously, the model might lead one to expect 5-function peaks in 
a PDF of decay exponents, corresponding to the various dynamic classes. The data 
presented in pQ show weak peaks at these values, but with a significant spread. In 
particular, some of extracted exponents would imply values of 9 that lie outside the 
range < 9 < 1 required by the model itself. With the help of our simulation data, we 
can look at whether the spread arises through miscategorisation; a poor fitting method; 
fluctuations in the fitted exponents due to noise inherent in the Hawkes process itself; 
or failure of the Hawkes model to accurately describe the YouTube data. 

5. Generating the synthetic data 

We carry out a discrete-time simulation of the Poisson/Hawkes process, restricting 
attention to activity bursts initiated by external shocks. (We generally take each initial 
shock to comprise Nq = 5000 views.) We choose to generate a random number of views 
from a Poisson distribution with given mean at each timestep and update the rate 
accordingly afterward. Effectively, we treat the continuously varying X(t) as a constant, 
generate a given number of events and then modify A according to Eq. (JT]). We must 
also choose what values 9 and \i can take. Additionally, we need a normalisation for the 
distribution of waiting times, <f>. Following pQ we take this distribution normalized to 
unity (so that all those 'influenced' to watch a video by a particular viewing event do 
watch it eventually). Remembering that the waiting time will be an integer (due to our 
simulation strategy), we have 

with C the Riemann (^-function. This ensures 

oo 

£*(*) = !■ (4) 
t=i 

Our algorithm for generating the synthetic data is therefore as follows: 

(i) Shock the system by creating N(0) = Nq initial viewing events. 

(ii) For each viewing event, generate the number of subsequent viewers fa by sampling 
from P(/i). At time t we therefore seed n = Ylf=i Mi future viewing events. Each 
of these n future viewers has their own decay constant 9^ drawn from P(6). 

(iii) Generate a Poisson event rate A(£) by summing over the past history according to 
Eq.CJ 

(iv) Use this rate to generate the number of viewing events N(t) between time t and 
t + St. 

(v) Increment t by 5t and repeat steps (ii) to (v) until the maximum specified time has 
been reached. 

The model analysis in [1] assumes that all 9 values are equal. It is, however, not 
clear that all interactions would involve exactly the same response kernel. If those 
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influenced have a distribution of waiting habits, this can safely be averaged unless there 
is a correlation with the person exerting the influence (so that 9 varies between the 
events i in Eq.(j2J)). To allow for the latter possibility we carry out simulations not 
only with a single 9 = 0.4 but with a random distribution of 9^ to see if this modifies 
the results. For the latter we choose 9 from a truncated Gaussian with mean 0.4 and 
standard deviation 0.2 (restricting the support to 9 e (0, 1)). 

Finally we need to choose the statistics of yUj. We will see that the particular choice 
of distribution does not make an appreciable difference to the results for the externally 
shocked system (although as detailed above, the value of (//) is important). Here we 
present results where \x is drawn from appropriately weighted ^-function distributions 
as well as Poisson distributions. 



6. Fitting the data 

We estimate decay exponents from the artificially produced timeseries both via the 
method described in jTJ and using a maximum likelihood estimator. The least squares 
estimator used in jTJ can give incorrect parameters [6] since some of the assumptions 
behind it are violated for power law decays. However, in our study we find little 
difference between the maximum likelihood decay exponents and the least squares 
exponents, which is evidence that errors in the exponent estimation method used in 
[TJ are not the main cause of the large spread of observed exponents. 



6.1. Maximum likelihood estimator 

Each post-shock timeseries decay has two free parameters once the peak has been 
identified: the decay exponent ft, and the time at which the peak has decayed to the 
background noise level t max - To construct the maximum likelihood estimator (MLE) 
for ft, we assume the data to be independent identically distributed random variables 
drawn from a discrete power law distribution, P(t,ft). That is, with a peak occurring 
at t = 0, we expect, for t = 1, . . . , t max 

P(t,ft) = jf^- (5) 

where 

tmax 

flw = E k-e (6) 

k=l 

is a generalized harmonic number. 

For every possible £ max we find the best fit value of ft for this distribution using 
a maximum likelihood estimator. For a timeseries A(t), our dataset consists of A(tj) 
observations at each time t{. Each of these ti's has an individual likelihood given by 
P(ti,ft). We assume each observation of £j is independent and so the likelihood of the 
dataset factorises into the product of the individual likelihoods 

c{ft)= n p{u,p). (7) 
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To find the best fit value of /3 for a given dataset, we maximise the likelihood with 
respect to (3. (In fact, since the likelihood is such a small value, we instead maximise 
the logarithm of the likelihood, but this gives the same result.) To find the best value of 
£ max we follow the method of [6] and choose that i max which minimises the Kolmogorov- 
Smirnov distance statistic. That is, for each value of t max we find the best fit decay 
exponent and calculate 



where E(x) is the empirical cumulative distribution function, and C(x) the best-fit- 
hypothesis cumulative distribution function. We then pick t max as that value which 
minimises D. (Note that this fitting method, while finding the best fit, tells us nothing 
about the quality of that fit.) 

6.2. Least squares estimator 

We also calculate the decay exponents for the same timeseries using the method 
described by Crane and Sornette pp. This uses a least squares regression on the dataset 
to find the decay exponent from the peak over a fixed window size. For each fit, they 
look at the distribution of the relative residuals, i.e., the difference between the model 
and the empirical data, divided by the expected value at that point. If the relative 
residuals are not distributed normally, the fit for that window size is rejected. The best 
fit to the data is chosen to be the largest window size with normally distributed relative 
residuals. Following [1] we reject the fit if the hypothesis of a normal distribution is 
violated at the 1% level using a \ 2 test- 

6.3. Fitting to an ensemble average 

The individual timeseries that we generate are subject to a reasonable amount of noise 
giving a spread of best fit decay exponents. Given that we control the time and size of 
the initial shock, we can easily obtain better statistics for the different parameter choices 
by considering ensemble averaged timeseries. This allows us to observe the behaviour 
of the decay for a longer time and get a better fit for the decay exponents. 

The fitting method in this case is as before; we obtain the best fit j3 value by 
maximising the likelihood of the data. The decay now occurs over the whole tail of the 
timeseries and so we do not need to find t max ; we can set it manually as equal to the 
largest time in our dataset. We fit both from the peak of the decay and the 'tail'. To 
determine where the tail of the data starts, we follow the same procedure as detailed 
above for finding £ max , only this time we apply it to find t min . That is, for each t min 
value, we calculate D (Eq. EJ) of the best fit and subsequently choose as our lower cut-off 
that t min which minimises D. We obtain errors on our estimates of by noting that our 
MLE is asymptotically optimal [7j, for TV observations, the variance in the estimated 
value is therefore given by the inverse of the observed Fisher information [8] 



D = max\E(x) - C(x)\ 



(8) 



Var(/3) 



1 



(9) 



NJ(P) 
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with 

which can easily be obtained numerically. The MLE is asymptotically Gaussian with 
mean f3 and variance given by Eqj9] and so confidence intervals are just the standard 
Gaussian ones. 



7. Results 



To begin, we look at the behaviour of the ensembled-averaged timeseries. As expected, 
we see a clear distinction between the subcritical decay (where (/i) < 1) and the critical 
decay (with (fj) = 1). The best fit decay exponents are also those expected (figured]). 
The difference between critical and subcritical decays remains when we draw 9 from a 




Figure 1. Ensemble average decay exponents with P(9) = 5(6 — 0.4) and P(n) as 
indicated. Each dataset is the average of 700 realisations with an initial shock of 5000 
views. Lines show the best fit decay exponent in the tail of the decay (t > 8 for 
critical and t > 344 for subcritical decay) obtained from the MLE, ± figures are 95% 
confidence intervals whose calculation is detailed in the text. The decay exponents are 
in good agreement with the theoretical values of 0.6 and 1.4. 

Gaussian distribution (figure |2]). We do, however, notice a significant difference from 
the single-valued 9 case: the numerical values of the decay exponents no longer agree 
well with the predicted values. 

Notice how the subcritical decay appears to exhibit a crossover between a short 
time "critical" decay exponent and long time subcritical decay. Increasing (fi) towards 
the critical value of unity moves the crossover to later and later times. Interestingly, 
when 9 is drawn from a Gaussian distribution, both the critical and subcritical decays 
exhibit some sort of crossover behaviour, not seen in the critical decay for single-valued 
9. This crossover can contribute to the spread of exponents in the subcritical case, since 
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Figure 2. Ensemble average decay exponents with P(9) — iV(0.4,0.2) and P(/x) as 
indicated. Each dataset is the average of 700 realisations with an initial shock of 5000 
views. Lines show the best fit decay exponent in the tail of the decay (t > 114 for 
critical and t > 14 for subcritical decay) obtained from the MLE, ± figures are 95% 
confidence intervals. The decay exponents are no longer in good agreement with the 
theoretical values of 0.6 and 1.4. 



the fitting mechanism may pick up the early time decay. The crossover observed is 
discussed in detail in the context of the ETAS model in [31 [9] . 

We now look at the distribution of decay exponents of individual timeseries obtained 
from both the MLE and least squares estimator. Our results for the ensembled averaged 
timeseries indicate that we will likely not see the asymptotic exponent in the subcritical 
case if we fit the entire post-shock timeseries (t m i n = 0), as the decay exponent will be 
skewed by the early time 'critical' decay We therefore show results with £ min = 0, 10, 20 
and 30; these latter fits will give us an indication of what the tail exponent looks like. 
The results for a single value of 9 are shown in figure those with 9 from a Gaussian 
distribution are shown in figure HI 

As expected, fitting the entire post-peak timeseries underestimates the subcritical 
decay exponent. Both fitting methods pick up the early time decay, which is slower; once 
the early time peak is ignored, the decay exponents are more similar to the tail seen in the 
ensemble- averaged case. We note that the fits for t min ^> 1 do have quite a large spread 
of exponents. This is due to poor statistics in the tail of the decay: the fluctuations are 
large enough that we occasionally pick up a highly anomalous decay exponent. This 
form of statistical noise appears to be intrinsic to the Hawkes process once the data is 
filtered by £ min . Improved fits, and presumably also narrower distributions of the fitted 
exponents, would arise if we used much larger initial shocks (No ^ 5000). 

Our choice of N is however consistent with the statements in |T] of mean total 
views in the tens of thousands (with at least 20% of these viewed on the peak day) 
for the shocked case. It is intended to give a realistic estimate of the intrinsic noise in 
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Figure 3. Histograms of extracted decay exponents for critical and subcritical 
timcscrics and P(6) = 5(9— 0.4), initial shock is 5000, 700 realisations. Grey histograms 
show exponents obtained from non-linear least squares fitting, white histograms are 
obtained from MLE fits. Note how there are two distinct peaks in the distribution, 
corresponding to critical and subcritical decays. The subcritical peak moves from 
/3«lto/3~1.4 when we avoid picking up the early time critical decay described in 
the text. 




the Hawkes process, to see if this can account for the large exponent spread actually 
found in [1]. Comparison of their figure 4 with our figures [3] and H] shows such an 
explanation to be implausible: the exponent spread in the YouTube data is much too 
large, particularly for the subcritical case. We have also performed simulations with 
Nq = 500 and N = 50000, i.e., one order of magnitude in either direction from the 
results reported here. In studying the ensemble average of 700 such timeseries from 
these simulations, we find that we cannot reject, at the 95% significance level, the 
hypothesis that the data are the same as those we have reported for Nq = 5000. In 
other words, the size of the initial shock does not affect the statistics of the resulting 
timeseries. For the small initial shocks (N = 500), the spread of individually fitted 
exponents is indeed larger than those we show here with Nq = 5000 and vice versa for 
the larger shocks (Nq = 50000). This is simply due to the fitting method being more 
(less) affected by statistical fluctuations. The peaks of the exponent distributions do 
not, however, change appreciably. 
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Figure 4. Histograms of extracted decay exponents for critical and subcritical 
timcseries and P{6) = N(0A, 0.2), initial shock is 5000, 700 realisations. Grey 
histograms show exponents obtained from non-linear least squares fitting, white 
histograms are obtained from MLE fits. Note how there are two distinct peaks in 
the distribution, corresponding to critical and subcritical decays. The subcritical peak 
moves from /3 w 1 to j3 « 1.3 when we avoid picking up the early time critical decay 
described in the text. 



7.1. Classifying timeseries 

Crane and Sornette do not have a priori knowledge of which dynamic class each 
timeseries belongs to. Because the exponents do not fall into clear classes, they use 
a classification method based on the fraction of the total views that arise on the day 
of maximal viewing, termed the "peak fraction" (F). (This fraction is of course a 
measure of the steepness of the subsequent decay, hence of j3.) In our simulations, 
since we know (//), and hence which class any given timeseries is actually in, we can 
look at the peak fraction and see if this method allows for any misclassification. The 
classification according to F in [1] is to consider F > 0.8 as an exogenous subcritical 
decay, 0.2 < F < 0.8 as exogenous critical decay and F < 0.2 as endogenous critical 
decay. There are some further comments that the classification between the two 
exogenous cases is not significantly altered when varying the boundary between F = 0.7 
and F — > 1. We have not calibrated our simulations to any of Crane and Sornette's 
data, and hence do not know how long the time increment in our updates is relative 
to their data. The boundaries we choose for classification will therefore not have the 
same numerical values; this will not, however, invalidate our study of the classification 
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method. We find that our simulated timeseries show two well-defined peaks in the 
distribution of peak fractions. Choosing a cut-off of F < 0.2 to define exogenous critical 
and F > 0.2 to define exogenous subcritcal decay (recall we do not treat the endogenous 
case) results in no misclassification. 

Fraction of total viewing activity taken up by peak 

■ Critical, 8 from 5 function distribution 

□ Critical, 9 from Gaussian distribution 
n Subcritical, 8 from 5 function distribution 

□ Subcritical, 8 from Gaussian distribution 



0.0 0.1 0.2 0.3 0.4 0.5 

Peak fraction 

Figure 5. Distribution of peak fractions for critically and subcritically decaying 
timeseries. Parameter values as indicated in legend. Choosing a cutoff value of F = 0.2 
for the peak fraction would result in no misclassifications. 

Figure |5] shows a histogram of peak fractions of simulations with P(8) = 5(9 — 0.4) 
and with a Gaussian distribution. In both cases, there is an obvious divide between 
subcritically decaying timeseries and critical decays. For a suitably placed boundary 
between high and low peak fractions (F = 0.2), this method correctly classifies every 
single timeseries we have studied. 

8. Conclusions 

The observed behaviour of the Hawkes process subject to external shock is, for the case 
of a single- valued 9 distribution, exactly as predicted in Refs.[U E]. When 9 is drawn 
from a broad distribution, the numerical values of the decay exponent are modified, 
but the overall picture of critical and subcritical decays remains. Our results show a 
significant spread of fitted decay exponents, though much less than that seen in the 
YouTube data reported in [TJ. We can, however, shed some light on this. We have good 
control of all the timeseries we fit to, in particular, we ensure that they are all subject 
to the same size of fluctuation (by always studying the same size of shock). Crane 
and Sornette do not have this luxury. Our fitting to the tails of individual timeseries 
shows that the exponent can vary widely if the statistics are poor (in some instances 
the best-fit exponent is very different from that of the underlying distribution of which 
a given timeseries represents a single sample). It seems likely that some of the breadth 
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in the range of exponents seen in [I] is caused by considering many timeseries with 
poor statistics in the tail. By only considering timeseries with particularly large peaks 
(relative to the background viewing rate), a set of decay exponents with lower variance 
might be obtained. Of course, this would have a cost in terms of the overall statistics 
of the sample. 

In addition, our study shows that the peak fraction classification method is a 
good one and we suggest that carrying out this classification and then fitting to the 
ensemble average of suitably normalised timeseries may give the best estimate of the 
decay exponents. We have also shown that the least squares fitting method gives results 
that are not very different from the maximum likelihood approach favoured here. 

Our results demonstrate a way to test if 9 is really a unique global constant 
( equivalent ly, drawn from a ^-function distribution). The ensemble-averaged timeseries 
in this case are measurably different from those where 9 is broadly distributed with the 
same mean. Particularly, we observe a crossover effect in the critical decay for a broad 
9 distribution that is not present if 9 is constant. If the timeseries can be correctly 
classified using the peak-fraction method, an ensemble average of (suitably normalised) 
critical timeseries might be diagnostic of whether 9 is effectively constant or not. 

Finally, we reiterate that although our analysis of synthetic Hawkes process data 
results in a spread of fitted exponents within each dynamic class, this intrinsic noise does 
not fully account for the much wider distributions seen in the YouTube data of Crane 
and Sornette pp. This suggests limits to the universality concept which presumably 
underlies attempts to classify activity bursts in social systems into 'robust dynamic 
classes' pQ. While the Hawkes process is clearly useful in analysing real- world data 
from complex social systems, some fairly basic observables, such as the variance of the 
exponent distribution for individual activity bursts, are seemingly not captured by it. 
These aspects are thus either nonuniversal or lie in the universality class of a more 
complex model than Hawkes. 
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